ABSTRACT

Despite the increasing criticism of statistical 
significance testing by researchers, particularly in the publication 
of the 1994 American Psychological Association’s style manual,
statistical significance test results are still popular in journal 
articles. For this reason, it remains important to understand the 
logic of inferential statistics. A fundamental concept in inferential 
statistics is the sampling distribution. This paper explains the
sampling distribution and the Central Limit Theorem and their role in 
statistical significance testing. Included in the discussion is a 
demonstration of how computer applications can be used to teach 
students about the sampling distribution. The paper concludes with an
example of hypothesis testing and an explanation of how the standard 
deviation of the sampling distribution is either calculated based on 
statistical assumptions or is empirically estimated using logics such 
as the “bootstrap.” These concepts are illustrated through the use of
hand-generated and computer examples. An appendix displays five
computer screens designed to teach these topics. (Contains 1 table, 4 
figures, and 20 references.) (Author/SLD)

Abstract

Despite increasing criticism of statistical significance testing by 
researchers, particularly in the publication of the 1994 American 
Psychological Association’s style manual, statistical significance test results 
are still popular in journal articles. For this reason, it remains important to 
understand the logic of inferential statistics. A fundamental concept in 
inferential statistics is the sampling distribution. This paper explains the 
sampling distribution and the Central Limit Theorem and their role in 
statistical significance testing. These concepts are illustrated through the 
use of hand-generated and computer examples.

Understanding the Sampling Distribution and its
Use in Testing Statistical Significance 
In recent years, statistical significance testing has been increasingly 
criticized by researchers. In fact, the Journal of Experimental Education has 
an entire issue dedicated to a discussion of statistical significance testing 
(Thompson, 1993a). Articles within the journal provide explanations of 
what statistical significance testing actually does and why people have
persisted in using it (Shaver, 1993, p. 293). In addition, they present the 
reader with alternatives to statistical significance testing (Thompson,
1993b) or, at a minimum, suggest that effect size should be reported along
with statistical significance (Carver, 1993). According to Thompson
(1994b), the questions that should concern scientists when
engaging in statistical significance testing are “(a) what the magnitude of
sample effects are and (b) whether these results will generalize” (p. 6).
Unfortunately, statistical significance testing does not answer either of 
these questions (Thompson, 1994a). 

Despite the concerns raised about statistical significance testing 
by researchers, and the fact that the Publication Manual of the American 
Psychological Association (1994, p. 18) itself alerts the researcher to the
limitations of statistical significance testing and encourages one to provide 
effect size information, statistical significance test results are still popular 
in journal articles. For this reason, it remains important to understand the 
logic of statistical significance testing. 

The purpose of this paper is to explain the sampling distribution,
which is one of the fundamental concepts underlying all inferential
procedures (Chalmer, 1987; Freund & Smith, 1986; Hinkle, Wiersma, &
Jurs, 1994; Mohr, 1990). A definition and explanation of the sampling
distribution and its relation to statistical significance testing will be
provided. Included in the discussion is a demonstration of how computer 
applications can be used to teach students about the sampling distribution 
and the Central Limit Theorem. The paper concludes with an example of 
hypothesis testing and an explanation of how the standard deviation of the 
sampling distribution is either calculated based on statistical assumptions, 
or is empirically estimated using logics such as the “bootstrap.”

Chain of Reasoning in Inferential Statistics 
When conducting statistical significance testing, the researcher is 
trying to infer something from the sample being observed. This is why 
statistical significance testing is called inferential statistics. Thus, there are 
generally two tasks of inferential statistics. The first task is to test 
hypotheses about parameters. The second task is to use statistics 
(descriptive measures of a sample) to make statements about or to 
estimate parameters (descriptive measures of a population). The 
parameters are unknown and that is why inferences need to be made 
about them (Chalmer, 1987; Hinkle et al., 1994). For example, if the
students in a representative sample of undergraduate and graduate students
at a major university spend an average of two hours per day during a
semester in the student center, we might reasonably infer that all students
at the university spend approximately two hours per day during a semester
in the student center.

Hinkle et al. (1994) describe a chain of reasoning for inferential 
statistics which is illustrated in Figure 1. They state that the first step in 
inferential statistics is to draw a randomly selected (or at least a
representative) sample. A randomly selected sample is one in which
“...every member of the population has an equal chance of being selected”
(Mattson, 1981, p. 75).

The sample needs to be a random sample because we are trying to 
make inferences about the population from the sample. If the sample is 
not randomly selected we may be introducing systematic bias into the 
sample, which can be either intentional or unintentional. A biased sample 
would not give us accurate information about the population and the 
population is what we are interested in (Mattson, 1981). For example, if 
you wanted a law to be passed that only English could be spoken in the 
classroom, you might intentionally choose to sample only those people that 
you knew did not support bilingual education. Thus, your sample results 
would make it appear that the majority of people in the United States 
supported your position and the law would be passed. Unintentional 
systematic bias could exist if you decided to sample your population by 
taking the first 200 people listed in the phone book. In this case, there
would be many sources of potential bias; for example, you would reach only
people who have telephones and who are listed in the telephone directory.

According to Hinkle et al. (1994), the second step in the chain of 
reasoning for inferential statistics is that “...the estimate from this sample
must be compared to an underlying distribution of estimates from all other
samples of the same size that might be selected from the population” (p.
147). An underlying distribution can be defined as “...the distribution of all
possible outcomes of a particular event” (Hinkle et al., 1994, p. 138).

The third step in inferential statistics involves making inferences
based on a comparison of the sample statistic with the underlying
distribution of the statistic, and on the associated probability, when
random sampling has been employed (Hinkle et al., 1994). The sampling
distribution is this underlying distribution of the statistic (Hinkle et
al., 1994, p. 149).

The Sampling Distribution 

Hinkle et al. (1994) provide a formal definition: a sampling distribution
“...is the distribution of all values of the statistic under
consideration, from all possible random samples” (p. 149). The sampling
distribution most commonly seen in textbooks is the sampling distribution
of the mean; however, the reader should be aware that a sampling
distribution exists for any statistic, such as the median or the
standard deviation.

Sampling distributions can be derived either empirically or 
theoretically. Most sampling distributions of a statistic have already been 
established theoretically; however, to understand the concept of a 
sampling distribution it is useful to demonstrate empirical methods of 
deriving these distributions (Mattson, 1981). The empirical methods can
consist of hand calculations or, when these would be too lengthy (which is
often the case), computer applications.

To illustrate the concept of a sampling distribution using hand 
calculations, consider constructing the sampling distribution for the mean 
of a random sample of size n = 2 from a finite population of size N = 5.
The elements of the population will be the numbers 2, 4, 6, 8, and 10. The 
mean of the population is: 

\[ \mu = \frac{2 + 4 + 6 + 8 + 10}{5} = 6 \]

and

\[ \sigma = \sqrt{\frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5}} = \sqrt{8} \approx 2.8 \]



Taking random samples of n = 2 from the population, there are 10
equally probable possibilities:

2 and 4, 2 and 6, 2 and 8, 2 and 10, 4 and 6,
4 and 8, 4 and 10, 6 and 8, 6 and 10, 8 and 10

The mean of the first sample is (2 + 4)/2 = 3. The means of the remaining
samples may be calculated in the same way, yielding the following values:
4, 5, 6, 5, 6, 7, 7, 8, and 9. If sampling is random, so that each sample
has the probability 1/10 (each outcome [1] divided by the number of
equally likely outcomes [10]), the sampling distribution of the mean would
be as shown in Table 1.

This example illustrates two important points. First, the mean of the
sampling distribution equals the mean of the population, which equals
μ = 6. In addition, the standard deviation of the sampling distribution of
the mean is smaller than the standard deviation of the population, SD_M =
1.73 < σ = 2.8 (Hinkle et al., 1994). In this example, SD_M = 1.73 is the
standard error of the mean, i.e., the standard deviation of the sampling
distribution.
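
The enumeration above can be reproduced in a few lines of Python. The sketch below is an illustration rather than part of the original paper. The function name sampling_distribution_of_mean is hypothetical, and the code assumes, as in the hand calculation, that samples are drawn without replacement and that each of the 10 samples is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

population = [2, 4, 6, 8, 10]

def sampling_distribution_of_mean(values, n):
    """Return {sample mean: probability} over all equally likely samples of
    size n drawn without replacement (hypothetical helper for illustration)."""
    samples = list(combinations(values, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

dist = sampling_distribution_of_mean(population, 2)

# Mean and standard deviation of the sampling distribution
# (the latter is the standard error of the mean).
mu_m = sum(m * p for m, p in dist.items())
sd_m = sqrt(sum(p * (m - mu_m) ** 2 for m, p in dist.items()))

for m, p in dist.items():
    print(f"mean = {m}, probability = {p}")
print(f"mu_M = {mu_m}, SD_M = {sd_m:.2f}")
```

Exact fractions are used so that the probabilities sum to exactly 1 and the mean of the sampling distribution can be compared with the population mean without rounding error.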

This was just one example with a very small sample and a very small
population. You can create many such examples yourself, and you will see
that the expected value of all possible values of the sample mean from
random samples of size n equals the mean of the population. That is, μ_M =
μ. In addition, the standard error of the mean (standard deviation of the
sampling distribution of the statistic) is always less than or equal to the
standard deviation of the parent population (Moore & McCabe, 1989).

Upon closer examination, one sees that the standard error of the 
mean increases as the variability of the population increases and decreases 
as the sample size increases. Thus, the standard error of the mean is
directly proportional to the standard deviation of the population and
inversely proportional to the square root of the sample size. The
formula for the standard error of the mean is (Freund & Smith, 1986,
p. 274):

\[ SD_M = \frac{\sigma}{\sqrt{n}} \]
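
One caveat is worth a quick check. For the hand-calculated example, σ/√n gives 2.83/√2 = 2.0, not the 1.73 obtained by enumeration. The difference arises because that example samples without replacement from a very small population, whereas the formula assumes independent draws (sampling with replacement, or a population that is large relative to the sample). The Python sketch below is an illustration, not part of the original paper; it enumerates both cases, and the helper names pop_sd and sd_of_sample_means are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

population = [2, 4, 6, 8, 10]
n = 2

def pop_sd(values):
    """Population standard deviation (divides by N, not N - 1)."""
    mu = sum(values) / len(values)
    return sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def sd_of_sample_means(samples):
    """Standard deviation of the means of a set of equally likely samples."""
    means = [sum(s) / len(s) for s in samples]
    return pop_sd(means)

sigma = pop_sd(population)
se_formula = sigma / sqrt(n)

# All 25 ordered samples drawn with replacement (independent draws).
se_with_replacement = sd_of_sample_means(list(product(population, repeat=n)))

# The 10 samples drawn without replacement, as in the hand calculation.
se_without_replacement = sd_of_sample_means(list(combinations(population, n)))

print(f"sigma / sqrt(n)     = {se_formula:.2f}")
print(f"with replacement    = {se_with_replacement:.2f}")
print(f"without replacement = {se_without_replacement:.2f}")
```

With independent draws the enumerated standard error matches σ/√n, while sampling without replacement from five values yields the smaller figure reported in the hand calculation.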

Central Limit Theorem 

Researchers do not normally calculate sampling distributions but
instead use theoretical sampling distributions which are defined by
mathematical theorems. One such theorem is the Central Limit Theorem
(CLT). The CLT states that, given a population with a mean equal to μ and
variance equal to σ², as sample size (n) increases, the sampling distribution
of the mean for simple random samples of n cases will approach the
normal distribution (Hinkle et al., 1994; Howell, 1987). Thus, as is
illustrated in Figure 2, with very small samples the shape of the
distribution will depend on the shape of the parent population, but with
samples of n = 30, even a skewed parent population will result in an
approximately normal sampling distribution (Harnett, 1975; Hinkle et al.,
1994). Schulman (1992) refers to
this phenomenon as the “magic of the normal distribution” (p. 19) and
states that without this magic, most of statistics would be limited to
applications where it could be demonstrated that the population had a
normal distribution.
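
The tendency described by the CLT can be checked by simulation. The Python sketch below is an illustration under stated assumptions, not the paper's own demonstration: the skewed parent population is taken to be exponential with mean 1, the number of samples (5,000) is arbitrary, and the helper names skewness and sample_means are hypothetical. It measures how skewed the distribution of sample means is for n = 2 and for n = 30.

```python
import random
from statistics import fmean

def skewness(values):
    """Skewness: third central moment divided by the SD cubed."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

def sample_means(n, n_samples, rng):
    """Means of n_samples random samples of size n from an assumed
    skewed population (exponential with mean 1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_n2 = skewness(sample_means(2, 5000, rng))
skew_n30 = skewness(sample_means(30, 5000, rng))

print(f"skewness of sample means, n = 2:  {skew_n2:.2f}")
print(f"skewness of sample means, n = 30: {skew_n30:.2f}")
```

A skewness near zero indicates a symmetric distribution, so the smaller value at n = 30 reflects the sampling distribution moving toward the normal shape.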

In order to discuss the CLT, the concept of the normal probability
distribution must be understood. The normal probability distribution has 
three properties: 

1. A normal distribution histogram is unimodal (has one mode or
peak) and it is symmetrical (i.e., the part of the curve to the right of the
mean is a mirror image of the part to the left). Its coefficients of skewness
and kurtosis are both zero (Bump, 1991).

2. The normal distribution is continuous. This means that for every
value of X there is a value for Y, and the total area underneath the curve is
equal to 100 percent (Chalmer, 1987; Hinkle et al., 1994).

3. “The normal distribution is asymptotic to the X axis” (Hinkle et al.,
1994, p. 88). The farther the curve is from the mean, the more closely it
approaches the X axis without ever actually touching it (Hinkle et al.,
1994; Mittag, 1992).
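
The three properties listed above can be checked numerically. The following Python sketch is offered only as an illustration; it uses the standard normal distribution (mean 0, standard deviation 1) from Python's statistics module, and the evaluation points and the integration step are arbitrary choices.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)

# Property 1: symmetry about the mean (equal heights at +d and -d).
symmetric = all(abs(z.pdf(d) - z.pdf(-d)) < 1e-12 for d in (0.5, 1.0, 2.0, 3.0))

# Property 2: the total area under the curve is 1 (100 percent),
# approximated here with a midpoint rule from -10 to 10.
step = 0.001
area = sum(z.pdf(-10 + (i + 0.5) * step) * step for i in range(20000))

# Property 3: the curve keeps approaching the X axis but stays above it.
tail_heights = [z.pdf(x) for x in (2, 4, 6, 8)]

print(f"symmetric: {symmetric}")
print(f"area under the curve: {area:.6f}")
print("heights at X = 2, 4, 6, 8:", tail_heights)
```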

Examples 

Hand Calculated Example 

To illustrate the concept of the CLT, we can refer to the example used 
above for the sampling distribution. As a reminder the population 
consisted of the numbers 2, 4, 6, 8, and 10. A histogram of the population 
is illustrated in Figure 3. As can be seen from the histogram the population 
is not a normal distribution. Now if we look at a histogram of the sampling 
distribution, which is illustrated in Figure 4, we see that it is approaching a 
normal distribution. 

Computer Examples

Computer programs have greatly advanced the teaching of statistical
concepts (Freund & Smith, 1986; Mittag, 1992; Schulman, 1992; Yang &
Robinson, 1986). To better illustrate the concepts of the sampling
distribution and CLT, a computer application can be used. The computer
application to be used in this paper was developed by James Lang.

HyperCard version 2.0 is needed to run the program on a Macintosh
computer. Samples of the program are provided in the Appendix. In the
first example shown in the Appendix, 200 samples were taken with a sample
size of n = 10. The mean of the population (μ) was 4.5 with a standard
deviation (σ) of 2.87. The mean of the sampling distribution (μ_M) was 4.46
and the standard error of the mean (SD_M) was .88. This is consistent with
the CLT because the standard error of the mean should be approximately
2.87/√10 = .91, and the observed value is close to it.

In the second example, 200 samples were taken with a sample size
of n = 20. The population parameters remain the same (μ = 4.5, σ = 2.87),
but the mean of the sampling distribution (μ_M) was 4.44 and the standard
error of the mean (SD_M) was .64. Again, this result is consistent with the CLT.
The standard error of the mean (SD_M) should have equaled 2.87/√20 = .64,
and it did.

In the third example, various sample sizes (n) were chosen and then
the computer took 500 samples of the selected sample sizes (n) from the
population. Means and standard deviations of the sets of sample means
were generated. This example demonstrates the relationship of the mean
(μ_M) and standard deviation of the sample means (SD_M) to the mean (μ)
and standard deviation (σ) of the population. As sample size increases, the
mean (μ_M) better approximates the population mean (μ). At sample size n
= 2, the mean of the sample means was 4.34 and the population mean was
4.5. At sample size n = 100, the mean of the sample means was
4.50104. As one would expect, the standard error of the mean also
decreased as sample size increased. At sample size n = 2, the standard
error of the mean (SD_M) was 2.11, whereas at sample size n = 100, the
standard error of the mean (SD_M) was .28.
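
Readers without access to HyperCard can approximate the third example with a short simulation. The Python sketch below is not the program described in the paper. The paper does not state which population the program uses; a population of the ten digits 0 through 9, each equally likely, happens to have μ = 4.5 and σ ≈ 2.87 and is assumed here purely for illustration. The function name simulate and the seed are likewise hypothetical choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

population = list(range(10))   # assumed population: the digits 0 through 9
mu = fmean(population)         # 4.5
sigma = pstdev(population)     # about 2.87

def simulate(n, n_samples=500, seed=1):
    """Draw n_samples random samples of size n (with replacement) and return
    the mean and the standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [fmean(rng.choices(population, k=n)) for _ in range(n_samples)]
    return fmean(means), pstdev(means)

for n in (2, 10, 20, 100):
    mean_of_means, sd_of_means = simulate(n)
    print(f"n = {n:3d}: mean of means = {mean_of_means:.2f}, "
          f"SD of means = {sd_of_means:.2f}, "
          f"sigma/sqrt(n) = {sigma / sqrt(n):.2f}")
```

Because the samples are random, the simulated values will differ somewhat from those reported above, but the standard deviation of the sample means should track σ/√n as n grows.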

CLT and Statistical Significance Testing 

Overview 

How do the concepts of the sampling distribution and the CLT relate 
to statistical significance testing? As was stated at the beginning of the 
paper, in statistical significance testing, the researcher is trying to make 
inferences about the population based on a random sample drawn from 
the population. When this random sample is drawn and a statistic such as 
the mean (M) is computed, the statistic represents both the parameter of
the population and sampling error. Statistical significance testing involves 
determining the magnitude of the difference between the statistic and the 
hypothesized value of the parameter. Once the researcher determines the 
magnitude of the difference, he/she makes a judgment as to whether this 
difference is “statistically significant” or not. In other words, the researcher
decides to either reject or fail to reject the null hypothesis (Hinkle et al., 
1994). 

Steps in Hypothesis Testing 

In order to better understand the role of the sampling distribution in 
statistical significance testing, the actual steps of hypothesis testing will be 
summarized and an example will be provided. When engaging in statistical 
significance testing, the researcher first states the null hypothesis. The null 
hypothesis states that there is no relationship or difference (Hinkle et al.,
1994). For example, if it is believed that the mean weight of male college
professors is 170, the null hypothesis would be:

H0: μ = 170.

The second step in statistical significance testing is to set the 
criterion for rejecting the H0. In order to do this, the researcher must
select a level of statistical significance which is the probability of making a 
Type I error. A Type I error is when a researcher rejects a null hypothesis 
that is actually true. The most common levels of significance selected are 
.05 and .01 (Hinkle et al., 1994). According to Hinkle et al. (1994), “The 
level of significance represents a proportion of area in a sampling 
distribution that equals the probability of rejecting the null hypothesis if it 
is true. This area of the sampling distribution is called the region of 
rejection” (p. 171). Using the above example that the mean weight of male 
college professors is 170, if we selected a random sample of n = 144 male
college professors to test our hypothesis and the sample mean (M) = 166,
we would have to use the sampling distribution to decide whether the
difference between the sample mean and the hypothesized population
mean is large enough to reject the null hypothesis. The sampling
distribution for this example is the theoretical distribution of the mean
weights from all possible samples of size n = 144 of male college
professors. Due to
the Central Limit Theorem, since the sample size is reasonably large, we
know that the distribution of sample means for this example is
approximately normal. We
would state that the mean of the distribution equals the population mean
(μ = 170), and the standard error of the mean (SD_M) equals 1.67 if the
population standard deviation (σ) is 20:

\[ SD_M = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{144}} = 1.67 \]

If the population standard deviation (σ) was unknown then the
sample standard deviation would have to be used to estimate the
population standard deviation (σ). Before the use of computers, researchers
had to rely on statistical assumptions to calculate the standard error. Now,
several microcomputer programs exist that allow the researcher to use
bootstrap logic to estimate the standard error (Reinhardt, 1992).
Conceptually, bootstrap methods copy the data set over and over again,
infinitely many times, to create a mega data set, which is equivalent to
resampling from the original data set with replacement. Large numbers of
samples are then drawn from the mega file, with statistics calculated for
each new sample, and then all the statistics are averaged (Thompson, 1993b,
p. 369).
As Reinhardt (1992) states, “computer-intensive bootstrap methods can
provide estimates for the standard error of results by using the actual
data, rather than relying on the assumption that the sampling error is
normally distributed...” (p. 15).
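
The bootstrap logic can be sketched in Python as follows. This is a minimal illustration, not the software Reinhardt (1992) describes: the weights are made-up values rather than data from the paper, the function name bootstrap_se_of_mean, the number of resamples, and the seed are hypothetical choices, and the standard error is taken here as the standard deviation of the resampled means.

```python
import random
from statistics import fmean, stdev

def bootstrap_se_of_mean(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling the data with
    replacement and taking the standard deviation of the resampled means."""
    rng = random.Random(seed)
    n = len(data)
    means = [fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return stdev(means)

# Hypothetical weights for a small sample; not data from the paper.
weights = [152, 158, 161, 165, 166, 168, 170, 171, 175, 179, 183, 190]

se_boot = bootstrap_se_of_mean(weights)
se_formula = stdev(weights) / len(weights) ** 0.5   # s / sqrt(n), for comparison

print(f"bootstrap SE = {se_boot:.2f}, s/sqrt(n) = {se_formula:.2f}")
```

For the mean, the bootstrap estimate should land close to the familiar s/√n; the appeal of the method is that the same resampling loop works for statistics whose standard error has no simple formula.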

The third step is to compute the test statistic. The formula for the
test statistic is:

\[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}} \]

In our example: test statistic = (166 - 170)/1.67. Thus, the test
statistic is equal to -2.4. The test statistic indicates the number of standard
errors the observed sample statistic (M) is from the hypothesized
parameter (μ). This test statistic is then compared to the critical value
found in the appropriate table. The critical values indicate the beginning
values for the region of rejection of the sampling distribution. If the
absolute value of the test statistic exceeds the critical value, the null
hypothesis is rejected (Hinkle et al., 1994). In our example, using the .05
level of significance for a two-tailed test, the critical values are ±1.96.
Thus, the null hypothesis would be rejected because -2.4 lies beyond the
critical value of -1.96.
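
The steps of this example can be sketched in Python. The function below is an illustration under the same assumptions as the worked example (a known population standard deviation and a normal sampling distribution); the name z_test and its parameters are hypothetical, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test for a mean with a known population SD.
    Returns (test statistic, critical value, whether H0 is rejected)."""
    se = sigma / sqrt(n)                             # standard error of the mean
    z = (sample_mean - mu0) / se                     # distance in standard errors
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # start of rejection region
    return z, critical, abs(z) > critical

z, critical, reject = z_test(sample_mean=166, mu0=170, sigma=20, n=144)
print(f"z = {z:.2f}, critical values = ±{critical:.2f}, reject H0: {reject}")
```

Calling z_test with alpha=0.01 raises the critical values to about ±2.58, in which case the same sample mean would not lead to rejection of the null hypothesis.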

Conclusion 

It is clear that the sampling distribution is a fundamental concept in 
statistical significance testing. Computer applications, such as the one 
illustrated in this paper, can be helpful in understanding the role of the 
sampling distribution and statistical assumptions such as the Central Limit 
Theorem in inferential statistics. In addition, computer-intensive bootstrap 
methods can be used to estimate the standard deviation of the sampling 
distribution when population parameters are unknown, using the actual 
data rather than having to rely on statistical assumptions (Reinhardt, 
1992).

Appendix



Sampling Distribution

The purpose of this program is to let you watch
the random process that leads to the sampling
distribution for the sample mean. You may also
compare the results of the program to the
theoretical statement called the Central Limit
Theorem.

Click on the population below that you want to
sample.

Example 1

Example 2

Example 3

Enter a sample size, n, below and click calculate. The computer will then take
500 samples each of size n from the population below. For each sample the
mean is then calculated. Then the mean and standard deviation of this set of
500 sample means is calculated. These values approximate the mean and
standard deviation of the sampling distribution of the sample mean.

Table 1

Sampling Distribution of the Mean

Mean    Probability
3       1/10
4       1/10
5       2/10
6       2/10
7       2/10
8       1/10
9       1/10

Figure 1. Chain of reasoning for statistical significance testing.





Figure 2. Sampling distributions of the mean for a skewed parent
population. Panel titles: “Sampling Distribution of the Mean for Sample
Size n” (two panels; one is labeled n = 30).

Figure 3. Histogram of the population.

Figure 4. Histogram of the sampling distribution of the mean.







